provides a wealth of information. It is important to remember that these are only
functional annotations of elements; some of the elements are under only weak selection
pressure, or none at all. For comparisons among vertebrates, including humans, the UCSC
Genome Browser is recommended (https://genome.ucsc.edu). It now compares a large
collection of genomes with one another (https://genome-euro.ucsc.edu/cgi-bin/hgGateway)
and also includes information from, e.g., the ENCODE project, such as methylation data,
as well as predictions by RepeatMasker, such as LINE elements.
How Can I Create a Phylogenetic Family Tree?
Phylogenetic trees provide an overview of functional and evolutionary relationships. A
number of software options for this purpose have been described in the book. Even a
simple program like CLUSTAL gives better results with experience
(https://www.ebi.ac.uk/Tools/msa/clustalo/ [newest version: Clustal Omega];
https://www.genome.jp/tools/clustalw/ [somewhat older version, aligns sequences over
their whole length quite fast and draws a phylogenetic tree]). With CLUSTAL it is
important to use sequences of approximately the same length; in addition, depending on
the presumed evolutionary distance, one can correct for this by choosing suitable scoring
matrices. More sophisticated software packages are correspondingly harder to use. An
example for accurate phylogenetic tree analysis is the PHYLogeny Inference Package
(PHYLIP; https://evolution.genetics.washington.edu/phylip.html), which constructs
phylogenetic trees from sequences using various methods, such as parsimony and
likelihood, and supports bootstrapping to assess how well the branches are supported
(see the website for detailed documentation). Another option is the software
MUSCLE (Multiple Sequence Comparison by Log-Expectation; https://www.drive5.com/muscle/),
which, in addition to multiple alignment, computes a phylogenetic tree based, for
example, on UPGMA (Unweighted Pair Group Method with Arithmetic Mean; a fast method when
there are many sequences) or neighbor joining (a better approximation to the true tree,
but slow when there are very many sequences). The results from MUSCLE can also be saved
in the Newick format, which is compatible with PHYLIP, and used there. Detailed
documentation on MUSCLE can be found on the MUSCLE website
(https://www.drive5.com/muscle/manual/) or on the EBI website
(https://www.ebi.ac.uk/Tools/msa/muscle/help/).
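To make the UPGMA idea concrete, the following is a minimal Python sketch. It is not the implementation used by MUSCLE or PHYLIP. It assumes that pairwise distances between already aligned sequences are available; the function name upgma, the labels A to D and the distance values are hypothetical and chosen only for illustration. The sketch repeatedly merges the two closest clusters, averages their distances to all remaining clusters weighted by cluster size, and writes the result as a Newick string.

```python
from itertools import combinations


def upgma(names, dist):
    """Return a UPGMA tree as a Newick string.

    names: list of sequence labels (hypothetical example labels below)
    dist:  dict mapping a pair (a, b) of labels to their pairwise distance
    """
    # cluster id -> (Newick fragment, number of leaves, node height)
    clusters = {name: (name, 1, 0.0) for name in names}
    # Store distances under unordered keys so (a, b) and (b, a) are the same
    d = {frozenset(pair): float(value) for pair, value in dist.items()}

    while len(clusters) > 1:
        # Pick the closest pair of clusters (ties: first pair found wins)
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        sub_a, n_a, h_a = clusters.pop(a)
        sub_b, n_b, h_b = clusters.pop(b)

        # The new node sits at half the distance between the two clusters
        height = d[frozenset((a, b))] / 2.0
        merged = f"({sub_a}:{height - h_a:.3f},{sub_b}:{height - h_b:.3f})"

        # Size-weighted average distance to every remaining cluster
        for c in clusters:
            d[frozenset((merged, c))] = (
                n_a * d[frozenset((a, c))] + n_b * d[frozenset((b, c))]
            ) / (n_a + n_b)

        clusters[merged] = (merged, n_a + n_b, height)

    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Hypothetical distances between four aligned sequences
example_distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
    ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
}

print(upgma(["A", "B", "C", "D"], example_distances))
# prints (D:5.000,(C:3.000,(A:1.000,B:1.000):2.000):2.000);
```

The printed string can be pasted into any viewer that reads Newick. UPGMA assumes that all lineages evolve at the same rate, which is one reason why neighbor joining often gives a better approximation to the true tree.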
19.2 RNA: Sequence, Structure Analysis and Control of Gene Expression
How Do I Find and Analyze an RNA Sequence and Structure?
Transcription produces an RNA that folds into a secondary structure. One important
database is Rfam, which is easy to search and gives an overview of the different RNA
families, including their sequences and structures. There are different functional RNA
classes, such as miRNAs and lncRNAs, which affect gene expression. Important databases
include miRBase (https://www.mirbase.org/) and LNCipedia (https://www.lncipedia.org/),
which provide specific information on sequence, structure and functional